Part II: A Presentation of Research on Loan Data Records from Prosper Loan¶

by Alexander Yirenkyi¶

Dataset and Investigation Overview¶

This dataset contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others. The analysis moves from simple univariate relationships to multivariate ones. It addresses questions such as: Is the monthly loan payment correlated with the original loan amount? How are loan terms distributed across loan statuses? How frequent is each value of the categorical variables (loan term, borrower's employment status, year of loan, and loan status)? Do individual loans differ depending on how large the original loan amount was?

Although the dataframe contains 81 features, this study is interested in only a handful of them, so the dataframe was reduced to the relevant columns: original loan amount, loan origination date, monthly loan payment, days since the last payment was made, stated monthly income, investors, and recommendations. These features were collected into a new dataframe that serves as the reference for exploration and analysis, and the insights drawn from it feed the presentation that follows.
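A minimal sketch of the column reduction described above, assuming pandas. The toy rows and the exact column names (`LoanOriginalAmount`, `MonthlyLoanPayment`, and so on) are assumptions for illustration and may not match the notebook's own code.

```python
import pandas as pd

# Toy stand-in for the full 81-column dataframe; names and values are assumed.
loans_full = pd.DataFrame({
    "ListingKey": ["a1", "b2", "c3"],
    "LoanOriginalAmount": [9425, 10000, 3001],
    "LoanOriginationDate": ["2007-09-12", "2014-03-03", "2007-01-17"],
    "MonthlyLoanPayment": [330.43, 318.93, 123.32],
    "StatedMonthlyIncome": [3083.33, 6125.00, 2083.33],
    "Investors": [258, 1, 41],
    "Recommendations": [0, 0, 0],
    "BorrowerAPR": [0.165, 0.120, 0.283],
})

# Keep only the features of interest for exploration.
columns_of_interest = [
    "LoanOriginalAmount",
    "LoanOriginationDate",
    "MonthlyLoanPayment",
    "StatedMonthlyIncome",
    "Investors",
    "Recommendations",
]

# .copy() gives an independent dataframe, so later edits do not touch loans_full.
loans = loans_full[columns_of_interest].copy()
print(loans.shape)
```

Working on a copy keeps the original dataframe available as an untouched reference.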

Several loan status values represent past-due loans in different ranges of days; these were replaced with a single value, "Past Due", regardless of how many days have passed. The stated monthly income and monthly loan payment variables were converted from float to integer for compatibility with the loan amount data type, the borrower state values were expanded from state abbreviations to full names, and the occupation column was changed from the object data type to a categorical data type.
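The transformations above could be sketched roughly as follows, assuming pandas. The column names, the sample status labels, and the `state_names` lookup are hypothetical stand-ins, not the notebook's actual code.

```python
import pandas as pd

# Toy rows; column names and values are assumptions for illustration.
loans = pd.DataFrame({
    "LoanStatus": ["Current", "Past Due (1-15 days)",
                   "Past Due (>120 days)", "Completed"],
    "StatedMonthlyIncome": [3083.33, 6125.0, 2083.75, 2875.0],
    "MonthlyLoanPayment": [330.43, 318.93, 123.32, 321.45],
    "BorrowerState": ["CO", "GA", "MN", "CA"],
    "Occupation": ["Other", "Professional", "Other", "Skilled Labor"],
})

# Hypothetical lookup; a real one would cover every state in the data.
state_names = {"CO": "Colorado", "GA": "Georgia",
               "MN": "Minnesota", "CA": "California"}

# Collapse every "Past Due (...)" bucket into a single "Past Due" value.
is_past_due = loans["LoanStatus"].str.startswith("Past Due")
loans["LoanStatus"] = loans["LoanStatus"].mask(is_past_due, "Past Due")

# Float to integer; astype(int) truncates and fails on missing values.
for col in ["StatedMonthlyIncome", "MonthlyLoanPayment"]:
    loans[col] = loans[col].astype(int)

# State abbreviation to full name, and object to categorical.
loans["BorrowerState"] = loans["BorrowerState"].map(state_names)
loans["Occupation"] = loans["Occupation"].astype("category")
print(loans)
```

Note that `astype(int)` drops the fractional part rather than rounding, so any missing values would need handling first.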

Histogram Distribution of Loan Original Amount¶


Histogram Distribution of Monthly Loan Payment¶

The monthly loan payment is also right-skewed, that is, asymmetric with a long tail to the right. Most monthly loan payments are clustered on the left side of the histogram. The peak of the monthly loan payment occurs at about 173 dollars, and the data range from about zero dollars to 2,251 dollars.
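One way to check right skew numerically, sketched with the standard library only: in a right-skewed sample the mean sits above the median and the moment-based skewness is positive. The `payments` list is made-up illustrative data, not values from the dataset, and `sample_skewness` is a helper defined here.

```python
from statistics import mean, median

# Made-up payments with a long right tail (illustration only).
payments = [100, 150, 173, 173, 200, 250, 400, 900, 2251]


def sample_skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


payment_mean = mean(payments)
payment_median = median(payments)
payment_skew = sample_skewness(payments)

# A mean above the median and positive skewness both point to right skew.
print(payment_mean > payment_median, payment_skew > 0)
```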

Kernel Density Estimator for Loan Original Amount¶

A kernel density estimator is used to display an estimate of the distribution of the loan original amount and the monthly loan payment.¶

Kernel Density Estimator for Monthly Loan Payment¶

The kernel density estimate approximates the probability density function of the data points. Densities are useful because probabilities can be calculated from them: the likelihood that a randomly chosen monthly loan payment falls between $300 and $500 is the area between the density curve and the x-axis over the range [300, 500] in the image below.
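The area under a density curve can be computed directly when the estimate uses Gaussian kernels, because each kernel's area over an interval is a difference of normal CDF values. The sketch below uses only the standard library; the sample values and the bandwidth of 50 are assumptions for illustration, and `kde_probability` is a helper defined here rather than a library function.

```python
from statistics import NormalDist


def kde_probability(sample, low, high, bandwidth):
    """Area under a Gaussian KDE between low and high.

    The KDE is the average of one normal curve per data point, so its
    area over [low, high] is the average of each curve's area there.
    """
    kernels = [NormalDist(mu=x, sigma=bandwidth) for x in sample]
    return sum(k.cdf(high) - k.cdf(low) for k in kernels) / len(kernels)


# Made-up monthly payments in dollars (illustration only).
payments = [120, 173, 173, 250, 310, 340, 420, 480, 650, 900]

# Estimated probability that a payment falls in [300, 500].
p_300_500 = kde_probability(payments, 300, 500, bandwidth=50)
print(round(p_300_500, 3))
```

The result depends on the bandwidth: a wider kernel spreads each point's mass further and smooths the estimate.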

To Identify the Distribution of Loan Terms¶

The visuals below show the frequency of the categorical variable loan term. Medium-term loans, in this case 36 months, occur most often, with a count of 87,778, or approximately 77 percent of all loans. The remaining 23 percent is split between long-term (60 months) and short-term (12 months) loans.
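The share quoted above follows from the counts given in the text, and the same kind of frequency table can be built with `value_counts`. This is a sketch assuming pandas; the ten-row `terms` series is toy data, not the real distribution.

```python
import pandas as pd

# Share of 36-month loans, using the two counts stated in the text.
share_36_months = 87778 / 113937 * 100
print(round(share_36_months, 1))

# Toy series showing how counts and shares per term can be tabulated.
terms = pd.Series([36] * 7 + [60] * 2 + [12] * 1, name="Term")
term_counts = terms.value_counts()                 # absolute frequencies
term_shares = terms.value_counts(normalize=True)   # relative frequencies
print(term_counts)
print(term_shares)
```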

Borrower's Employment Status Distribution¶

The visuals below show the frequency of the categorical variable borrower's employment status. Employed borrowers occur most often, with a count of 69,557, while retired borrowers occur least often. In this dataset, loans were disbursed far more often to working borrowers than to retired individuals.

To Determine Loan Distribution by Year¶

The visuals below show the frequency of the categorical variable year. The year 2013 had the highest number of loan disbursements, with 34,345 loans, followed by 2012 and 2014 in second and third position respectively. The year 2005 had the fewest disbursements, with 22 loans.
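A yearly frequency table like this one can be derived from the origination date. The sketch below assumes pandas and a `LoanOriginationDate` column stored as text; the five rows are toy data and `LoanYear` is a hypothetical column name.

```python
import pandas as pd

# Toy origination dates (illustration only).
loans = pd.DataFrame({
    "LoanOriginationDate": [
        "2005-11-15 00:00:00",
        "2012-06-01 00:00:00",
        "2013-03-20 00:00:00",
        "2013-09-05 00:00:00",
        "2014-01-10 00:00:00",
    ]
})

# Parse the text into datetimes, then pull out the calendar year.
loans["LoanYear"] = pd.to_datetime(loans["LoanOriginationDate"]).dt.year

# Count loans per year, ordered chronologically for plotting.
loans_per_year = loans["LoanYear"].value_counts().sort_index()
print(loans_per_year)
```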

Line Graph Depicting Relationship Between Monthly Loan Payment and Loan Original Amount¶

The visuals below examine the relationship between two continuous numerical variables, loan original amount and monthly loan payment. They show a positive correlation: as the original loan amount increases, the monthly loan payment increases with it.

Original Loan Amount Grouped by Loan Term Against Current Days of Delinquency¶

Earlier findings established a positive relationship between the loan original amount and the monthly loan payment. The scatterplot below categorises the data points by loan term to illustrate the relationship among three variables: two continuous numerical variables (loan original amount and monthly loan payment) and one categorical variable (term).
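One way to check numerically whether the term changes the amount-to-payment relationship is to compare the monthly payment per dollar borrowed across terms. This is a sketch assuming pandas; the rows are invented for illustration and `PaymentPerDollar` is a hypothetical derived column, so the output says nothing about the real data.

```python
import pandas as pd

# Invented loans for illustration; not taken from the dataset.
loans = pd.DataFrame({
    "Term": [12, 12, 36, 36, 36, 60, 60],
    "LoanOriginalAmount": [2000, 4000, 3000, 9000, 15000, 10000, 20000],
    "MonthlyLoanPayment": [180, 360, 100, 300, 500, 230, 460],
})

# Monthly payment per dollar borrowed, for each loan.
loans["PaymentPerDollar"] = (
    loans["MonthlyLoanPayment"] / loans["LoanOriginalAmount"]
)

# Average ratio within each term; differing values suggest that each
# term traces its own line in the amount-versus-payment scatterplot.
summary = loans.groupby("Term")["PaymentPerDollar"].mean()
print(summary)
```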

Original Loan Amount Grouped by Loan Year Against Current Days of Delinquency¶

The scatterplot below categorises the data points by loan year to illustrate the relationship among three variables: the loan original amount, the monthly loan payment, and the year. As established earlier, there is a positive relationship between the loan original amount and the monthly loan payment.

Heatmap and Correlation Matrix Showing Relationship Between Variables¶

The figure below plots a correlation matrix as a heatmap to show the linear correlation between the numerical variables. The heatmap shows a strong positive correlation between the original loan amount and the monthly loan payment, with a correlation coefficient of 0.93, while there appears to be no correlation between the stated monthly income and the original loan amount.
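A correlation matrix of this kind can be computed with `DataFrame.corr`, sketched below assuming pandas. The five rows are invented so that one pair is almost perfectly linear and another is uncorrelated; they are not dataset values, and the column names are assumptions.

```python
import pandas as pd

# Invented values for illustration only.
loans = pd.DataFrame({
    "LoanOriginalAmount": [2000, 4000, 6000, 8000, 10000],
    "MonthlyLoanPayment": [70, 135, 210, 275, 340],
    "StatedMonthlyIncome": [5000, 3000, 7000, 3000, 5000],
})

# Pairwise Pearson coefficients: +1 is a perfect increasing line,
# 0 means no linear relationship, -1 is a perfect decreasing line.
corr_matrix = loans.corr(method="pearson")
print(corr_matrix.round(2))

# With seaborn installed, the matrix could be drawn as a heatmap:
# import seaborn as sns
# sns.heatmap(corr_matrix, annot=True)
```

Pearson correlation captures only linear association, so a coefficient near zero does not rule out a non-linear relationship.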
